 health literacy


DischargeSim: A Simulation Benchmark for Educational Doctor-Patient Communication at Discharge

arXiv.org Artificial Intelligence

Discharge communication is a critical yet underexplored component of patient care, where the goal shifts from diagnosis to education. While recent large language model (LLM) benchmarks emphasize in-visit diagnostic reasoning, they fail to evaluate models' ability to support patients after the visit. We introduce DischargeSim, a novel benchmark that evaluates LLMs on their ability to act as personalized discharge educators. DischargeSim simulates post-visit, multi-turn conversations between LLM-driven DoctorAgents and PatientAgents with diverse psychosocial profiles (e.g., health literacy, education, emotion). Interactions are structured across six clinically grounded discharge topics and assessed along three axes: (1) dialogue quality via automatic and LLM-as-judge evaluation, (2) personalized document generation including free-text summaries and structured AHRQ checklists, and (3) patient comprehension through a downstream multiple-choice exam. Experiments across 18 LLMs reveal significant gaps in discharge education capability, with performance varying widely across patient profiles. Notably, model size does not always yield better education outcomes, highlighting trade-offs in strategy use and content prioritization. DischargeSim offers a first step toward benchmarking LLMs in post-visit clinical education and promoting equitable, personalized patient support.


Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations

arXiv.org Artificial Intelligence

Limited access to mental healthcare, extended wait times, and the increasing capabilities of Large Language Models (LLMs) have led individuals to turn to LLMs to meet their mental health needs. However, the multi-turn mental health conversation capabilities of LLMs remain under-explored. Existing evaluation frameworks typically focus on diagnostic accuracy and win-rates and often overlook alignment with the patient-specific goals, values, and personalities required for meaningful conversations. To address this, we introduce MedAgent, a novel framework for synthetically generating realistic, multi-turn mental health sensemaking conversations, and use it to create the Mental Health Sensemaking Dialogue (MHSD) dataset, comprising over 2,200 patient-LLM conversations. Additionally, we present MultiSenseEval, a holistic framework to evaluate the multi-turn conversation abilities of LLMs in healthcare settings using human-centric criteria. Our findings reveal that frontier reasoning models yield below-par performance for patient-centric communication and struggle with advanced diagnostic capabilities, with an average score of 31%. Additionally, we observed variation in model performance based on the patient's persona and a drop in performance as the number of turns in the conversation increases. Our work provides a comprehensive synthetic data generation framework, a dataset, and an evaluation framework for assessing LLMs in multi-turn mental health conversations.


LLM on FHIR -- Demystifying Health Records

arXiv.org Artificial Intelligence

Objective: To enhance health literacy and accessibility of health information for a diverse patient population by developing a patient-centered artificial intelligence (AI) solution using large language models (LLMs) and Fast Healthcare Interoperability Resources (FHIR) application programming interfaces (APIs). Materials and Methods: The research involved developing LLM on FHIR, an open-source mobile application allowing users to interact with their health records using LLMs. The app is built on Stanford's Spezi ecosystem and uses OpenAI's GPT-4. A pilot study was conducted with the SyntheticMass patient dataset and evaluated by medical experts to assess the app's effectiveness in increasing health literacy. The evaluation focused on the accuracy, relevance, and understandability of the LLM's responses to common patient questions. Results: LLM on FHIR demonstrated varying but generally high degrees of accuracy and relevance in providing understandable health information to patients. The app effectively translated medical data into patient-friendly language and was able to adapt its responses to different patient profiles. However, challenges included variability in LLM responses and the need for precise filtering of health data. Discussion and Conclusion: LLMs offer significant potential in improving health literacy and making health records more accessible. LLM on FHIR, as a pioneering application in this field, demonstrates the feasibility and challenges of integrating LLMs into patient care. While promising, the implementation and pilot also highlight risks such as inconsistent responses and the importance of replicable output. Future directions include better resource identification mechanisms and executing LLMs on-device to enhance privacy and reduce costs.


ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing to Improve Health Literacy and Communication in Pediatric Populations and Beyond

arXiv.org Artificial Intelligence

Purpose: Enhanced health literacy has been linked to better health outcomes; however, few interventions have been studied. We investigate whether large language models (LLMs) can serve as a medium to improve health literacy in children and other populations. Methods: We ran 288 conditions using 26 different prompts through ChatGPT-3.5, Microsoft Bing, and Google Bard. Given constraints imposed by rate limits, we tested a subset of 150 conditions through ChatGPT-4. The primary outcome measurements were the reading grade level (RGL) and word counts of output. Results: Across all models, output for basic prompts such as "Explain" and "What is (are)" was at, or exceeded, a 10th-grade RGL. When prompts were specified to explain conditions from the 1st to 12th RGL, we found that LLMs had varying abilities to tailor responses based on RGL. ChatGPT-3.5 provided responses that ranged from the 7th-grade to the college-freshman RGL, while ChatGPT-4 outputted responses from the 6th-grade to the college-senior RGL. Microsoft Bing provided responses from the 9th to 11th RGL, while Google Bard provided responses from the 7th to 10th RGL. Discussion: ChatGPT-3.5 and ChatGPT-4 did better in achieving lower-grade level outputs. Meanwhile, Bard and Bing tended to consistently produce an RGL at the high school level regardless of prompt. Additionally, Bard's hesitancy in providing certain outputs indicates a cautious approach towards health information. LLMs demonstrate promise in enhancing health communication, but future research should verify the accuracy and effectiveness of such tools in this context. Implications: LLMs face challenges in crafting outputs below a sixth-grade reading level. However, their capability to modify outputs above this threshold provides a potential mechanism to improve health literacy and communication in a pediatric population and beyond.
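
The abstract above reports reading grade level (RGL) as a primary outcome but does not say which readability formula was used. As a rough illustration only, the sketch below assumes the Flesch-Kincaid grade level formula and a crude vowel-group syllable counter; the function names `count_syllables` and `flesch_kincaid_grade` are hypothetical, not taken from the study.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" (but not "-le") as non-syllabic.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    dense = "Hepatocellular carcinoma necessitates multidisciplinary management."
    # Short words in short sentences should score lower than long clinical terms.
    print(round(flesch_kincaid_grade(simple), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

Because the syllable counter is heuristic, scores from this sketch will differ somewhat from those of established readability tools; it is meant only to show how an RGL-style number depends on sentence length and word length.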


Study: ChatGPT has potential to help cirrhosis, liver cancer patients

#artificialintelligence

A new study by Cedars-Sinai investigators describes how ChatGPT, an artificial intelligence (AI) chatbot, may help improve health outcomes for patients with cirrhosis and liver cancer by providing easy-to-understand information about basic knowledge, lifestyle and treatments for these conditions. The findings, published in the peer-reviewed journal Clinical and Molecular Hepatology, highlight the AI system's potential to play a role in clinical practice. "Patients with cirrhosis and/or liver cancer and their caregivers often have unmet needs and insufficient knowledge about managing and preventing complications of their disease," said Brennan Spiegel, MD, MSHS, director of Health Services Research at Cedars-Sinai and co-corresponding author of the study. "We found ChatGPT--while it has limitations--can help empower patients and improve health literacy for different populations." Patients diagnosed with liver cancer and cirrhosis, an end-stage liver disease that is also a major risk factor for the most common form of liver cancer, often require extensive treatment that can be complex and challenging to manage.


The Atrial Fibrillation Health Literacy Information Technology Trial: Pilot Trial of a Mobile Health App for Atrial Fibrillation

#artificialintelligence

Background: Atrial fibrillation (AF) is a common arrhythmia that adversely affects health-related quality of life (HRQoL). We conducted a pilot trial of individuals with AF using a smartphone to provide a relational agent as well as rhythm monitoring. We employed our pilot to measure acceptability and adherence and to assess its effectiveness in improving HRQoL and adherence. Objective: This study aims to measure acceptability and adherence and to assess its effectiveness to improve HRQoL and adherence. Methods: Participants were recruited from ambulatory clinics and randomized to a 30-day intervention or usual care. We collected baseline characteristics and conducted baseline and 30-day assessments of HRQoL using the Atrial Fibrillation Effect on Quality of Life (AFEQT) measure and self-reported adherence to anticoagulation. The intervention consisted of a smartphone-based relational agent, which simulates face-to-face counseling and delivered content on AF education, adherence, and symptom monitoring with prompted rhythm monitoring. We compared differences in AFEQT and adherence at 30 days, adjusted for baseline values. We quantified participants' use and acceptability of the intervention. Results: A total of 120 participants were recruited and randomized (59 to control and 61 to intervention) to the pilot trial (mean age 72.1 years, SD 9.10; 62/120, 51.7% women). The control group had a 95% follow-up, and the intervention group had a 93% follow-up. The intervention group demonstrated significantly higher improvement in total AFEQT scores (adjusted mean difference 4.5; 95% CI 0.6-8.3; P=.03) and in daily activity (adjusted mean difference 7.1; 95% CI 1.8-12.4; P=.009) compared with the control between baseline and 30 days. The intervention group showed significantly improved self-reported adherence to anticoagulation therapy at 30 days (intervention 3.5%; control 23.2%; adjusted difference 16.6%; 95% CI 2.8%-30.4%; P<.001). Qualitative assessments of acceptability identified that participants found the relational agent useful, informative, and trustworthy. Conclusions: Individuals randomized to a 30-day smartphone intervention with a relational agent and rhythm monitoring showed significant improvement in HRQoL and adherence. Participants had favorable acceptability of the intervention with both objective use and qualitative assessments of acceptability.


Toward Improving Health Literacy in Patient Education Materials with Neural Machine Translation Models

arXiv.org Artificial Intelligence

Health literacy is the central focus of Healthy People 2030, the fifth iteration of the U.S. national goals and objectives. People with low health literacy usually have trouble understanding health information, following post-visit instructions, and using prescriptions, which results in worse health outcomes and serious health disparities. In this study, we propose to leverage natural language processing techniques to improve health literacy in patient education materials by automatically translating health-illiterate language in a given sentence into plain language. We trained and tested state-of-the-art neural machine translation (NMT) models on a silver standard training dataset and a gold standard testing dataset, respectively. The experimental results showed that the Bidirectional Long Short-Term Memory (BiLSTM) NMT model outperformed Bidirectional Encoder Representations from Transformers (BERT)-based NMT models. We also verified the effectiveness of NMT models in translating health-illiterate language by comparing the ratio of health-illiterate language in the sentence. The proposed NMT models were able to identify the correct complicated words and simplify them into layman's language, but the models also suffered from problems with sentence completeness, fluency, and readability, and had difficulty translating certain medical terms.


On Curating Responsible and Representative Healthcare Video Recommendations for Patient Education and Health Literacy: An Augmented Intelligence Approach

arXiv.org Artificial Intelligence

Studies suggest that one in three US adults use the Internet to diagnose or learn about a health concern. However, such access to health information online could exacerbate the disparities in health information availability and use. Health information seeking behavior (HISB) refers to the ways in which individuals seek information about their health, risks, illnesses, and health-protective behaviors. For patients engaging in searches for health information on digital media platforms, health literacy divides can be exacerbated both by their own lack of knowledge and by algorithmic recommendations, with results that disproportionately impact disadvantaged populations, minorities, and low health literacy users. This study reports on an exploratory investigation of the above challenges by examining whether responsible and representative recommendations can be generated using advanced analytic methods applied to a large corpus of videos and their metadata on a chronic condition (diabetes) from the YouTube social media platform. The paper focusses on biases associated with demographic characters of actors using videos on diabetes that were retrieved and curated for multiple criteria such as encoded medical content and their understandability to address patient education and population health literacy needs. This approach offers an immense opportunity for innovation in human-in-the-loop, augmented-intelligence, bias-aware and responsible algorithmic recommendations by combining the perspectives of health professionals and patients into a scalable and generalizable machine learning framework for patient empowerment and improved health outcomes.


Artificial intelligence tool could increase patient health literacy, study shows

#artificialintelligence

A federal rule that requires health care providers to offer patients free, convenient and secure electronic access to their personal medical records went into effect earlier this year. However, providing patients with access to clinician notes, test results, progress documentation and other records doesn't automatically equip them to understand those records or make appropriate health decisions based on what they read. "Medicalese" can trip up even the most highly educated layperson, and studies have shown that low health literacy is associated with poor health outcomes. University of Notre Dame researcher John Lalor, an assistant professor of information technology, analytics and operations at the Mendoza College of Business, is part of a team working on a web-based natural language processing system that could increase the health literacy of patients who access their records through a patient portal. NoteAid, a project based at the University of Massachusetts Amherst, conveniently translates medical jargon for health care consumers.


YouTube for Patient Education: A Deep Learning Approach for Understanding Medical Knowledge from User-Generated Videos

arXiv.org Machine Learning

YouTube presents an unprecedented opportunity to explore how machine learning methods can improve healthcare information dissemination. We propose an interdisciplinary lens that synthesizes machine learning methods with healthcare informatics themes to address the critical issue of developing a scalable algorithmic solution to evaluate videos from a health literacy and patient education perspective. We develop a deep learning method to understand the level of medical knowledge encoded in YouTube videos. Preliminary results suggest that we can extract medical knowledge from YouTube videos and classify videos according to the embedded knowledge with satisfying performance. Deep learning methods show great promise in knowledge extraction, natural language understanding, and image classification, especially in an era of patient-centric care and precision medicine.